Many studies have been conducted on seeking the efficient solution forsubgraph similarity search over certain (deterministic) graphs due to its wideapplication in many fields, including bioinformatics, social network analysis,and Resource Description Framework (RDF) data management. All these worksassume that the underlying data are certain. However, in reality, graphs areoften noisy and uncertain due to various factors, such as errors in dataextraction, inconsistencies in data integration, and privacy preservingpurposes. Therefore, in this paper, we study subgraph similarity search onlarge probabilistic graph databases. Different from previous works assumingthat edges in an uncertain graph are independent of each other, we study theuncertain graphs where edges' occurrences are correlated. We formally provethat subgraph similarity search over probabilistic graphs is #P-complete, thus,we employ a filter-and-verify framework to speed up the search. In thefiltering phase,we develop tight lower and upper bounds of subgraph similarityprobability based on a probabilistic matrix index, PMI. PMI is composed ofdiscriminative subgraph features associated with tight lower and upper boundsof subgraph isomorphism probability. Based on PMI, we can sort out a largenumber of probabilistic graphs and maximize the pruning capability. During theverification phase, we develop an efficient sampling algorithm to validate theremaining candidates. The efficiency of our proposed solutions has beenverified through extensive experiments.
展开▼